MYKHAYLO SHKOLNIKOV

Abstract. We obtain universal estimates on the convergence to equilibrium 
and the times of coupling for continuous time irreducible reversible
finite-state Markov chains, both in the total variation and in the $L^2$ norms. The
estimates in total variation norm are obtained using a novel identity relating 
the convergence to equilibrium of a reversible Markov chain to the increase 
in the entropy of its one-dimensional distributions. In addition, we propose 
a universal way of defining the ultrametric partition structure on the state 
space of such Markov chains. Finally for chains reversible with respect to 
the uniform measure, we show how the global convergence to equilibrium 
can be controlled using the entropy accumulated by the chain. 



1. Introduction

Recently, the convergence to equilibrium of slowly mixing Markov chains 
appearing in statistical physics has attracted much attention. In this framework 
continuous time irreducible reversible Markov chains are defined by choosing the 
transition rates from a state (usually, a spin configuration) $a$ to a state (spin
configuration) $b$ to be proportional to $e^{-\beta(E(b)-E(a))_+}$, where $E$ is an energy
functional and $\beta$ stands for the inverse temperature, which in this context
is chosen to be large: $\beta \gg 1$. In this low temperature regime, a recurring
feature is that the energy landscape given by E divides the state space into 
sets of metastable (or, stable) states, which are separated by potential wells. 
The convergence to equilibrium of the corresponding Markov chain, started 
in a metastable state, is then governed by the time it takes to overcome the 
respective potential wells in order to reach the part of the state space with the 
lowest energy. 

The potential theoretic approach to metastability developed in the articles 
[7], [8] and [9] (see also the excellent summaries [5] and [6]) has been used 
to obtain precise information on metastable transitions for reversible Markov 
chains associated with several models of statistical physics. These include
certain disordered mean field models (see [8]) and, more specifically, the
Curie-Weiss model with a random field taking finitely many values (see [8], [3] and
[1]). Other examples of slowly mixing reversible Markov chains, in which the
metastable behavior has been analyzed in detail, include the Glauber dynamics
for the two-dimensional Ising model on a torus and its generalizations (see
[T5] and [19]), the three-dimensional Ising model on a torus (see [2]) and the
classical Curie-Weiss model (see p3] and [17]). Moreover, the first exit problem
from a domain for reversible chains with exponentially small transition
probabilities was studied in the article [20]. In a different but related line of research,
initiated by the article [16], the metastable transitions are studied for
diffusions with a small diffusion parameter, which are confined in a potential having
several local minima (see [10] and [11] for a recent account on this problem). 

Here, we take a different viewpoint. Instead of analyzing a specific Markov 
chain in detail, we try to understand some universal aspects of the ultrametric 
structure, that is, the presence of multiple time scales in a general irreducible 
reversible finite-state Markov chain. We obtain universal estimates on the con- 
vergence to equilibrium and the times of coupling in this abstract framework. 
We prove such results both in the total variation norm and in the $L^2$ norm. In
the case of the total variation norm, we utilize a novel entropy identity relating
the convergence to equilibrium of the chain to the increase of the entropy of its
one-dimensional distributions.

In addition, we propose a universal way of defining the ultrametric partition 
structure, that is, a sequence of partitions of the state space corresponding to 
the different time scales on which convergence to equilibrium occurs. Finally, 
in the case that the chain is reversible with respect to the uniform measure, we 
show how the entropy of the one-dimensional distributions can be utilized to 
control the global convergence to equilibrium of the chain. 

To give examples of the type of results we obtain, we introduce some
notation. Let $X$ be a continuous time irreducible reversible Markov chain on a
set $I = \{1,2,\dots,n\}$ of $n$ elements. Moreover, for $a \in I$ and $t \ge 0$ let
$P^a_t$ be the one-dimensional distribution of the chain at time $t$, when started in
$a$. Finally, write $\|\cdot\|_{TV}$ for the total variation norm, $\nu$ for the invariant
distribution of $X$ and let $H(\nu)$ be the entropy $-\sum_{a\in I}\nu(a)\log\nu(a)$ of the invariant
distribution $\nu$.

Before stating our first result rigorously, we would like to provide the reader 
with some intuition by giving an example. Fix natural numbers $M, N \ge 3$ and
consider the graph given by an arrangement of $M$ $N$-cycles in a cycle of size
$M$. Now, let $X$ be the continuous time Markov chain on this graph, which has
a transition rate $\rho_1 > 0$ for neighboring vertices belonging to the same $N$-cycle
and a transition rate $\rho_2 > 0$ for neighboring vertices belonging to different
$N$-cycles. Since the generating matrix of this Markov chain is symmetric, it is
reversible with respect to the uniform distribution on the set of vertices of the
graph. Next, suppose that $\rho_2$ is much smaller than $\rho_1$. Then, it is intuitively
clear that, for every vertex $a$ of the described graph, the quantity $\|P^a_{3t} - P^a_t\|_{TV}$
can only be large on the two disjoint time intervals, during which the Markov
chain mixes on the $N$-cycle containing $a$ and on the $M$-cycle formed by the
$M$ $N$-cycles, respectively. Under the scale-invariant measure, which has the
density $\frac{1}{t}$ on the time axis $[0,\infty)$, the union of these two time intervals has a
total measure of order $\log N + \log M = \log(MN)$. Thus, it is logarithmic in the
size of the state space of $X$. The purpose of Theorems 1 and 3 below is to show
that the latter property is universal for continuous time irreducible reversible
Markov chains, in the sense that the order of magnitude in this example is an
upper bound on the size of the corresponding quantity for a general reversible
Markov chain.
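The example can be made concrete with a short sketch that builds the generating matrix of such a chain. The text does not specify how neighboring $N$-cycles are attached to each other, so the code below assumes that consecutive $N$-cycles are joined by a single edge between their vertices with local index 0; the function name and the numerical rates are hypothetical.

```python
def cycle_of_cycles_generator(M, N, rho1, rho2):
    """Generator matrix of the chain on M copies of an N-cycle arranged in a cycle.

    Assumption (not specified in the text): consecutive N-cycles are joined by a
    single edge between their vertices with local index 0.
    """
    n = M * N
    Q = [[0.0] * n for _ in range(n)]

    def add_edge(u, v, rate):
        # Symmetric rates: the same rate in both directions.
        Q[u][v] += rate
        Q[v][u] += rate

    for m in range(M):
        for k in range(N):
            # Edge inside the m-th N-cycle.
            add_edge(m * N + k, m * N + (k + 1) % N, rho1)
        # Edge linking the m-th N-cycle to the next one.
        add_edge(m * N, ((m + 1) % M) * N, rho2)

    # Diagonal entries make every row sum to zero.
    for u in range(n):
        Q[u][u] = -sum(Q[u][v] for v in range(n) if v != u)
    return Q


# Example with rho2 much smaller than rho1 (illustrative values).
Q = cycle_of_cycles_generator(M=3, N=4, rho1=1.0, rho2=0.01)
```

The resulting matrix is symmetric with zero row sums, which is exactly the property used in the text to conclude reversibility with respect to the uniform distribution.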

Theorem 1. Let $X$ be a continuous time irreducible reversible Markov chain
on the set $I = \{1,2,\dots,n\}$ and let $\nu$ be its invariant distribution. Then, the
following is true.

(a) For every $\delta > 0$, there is a constant $C(\delta) > 0$ depending only on $\delta$ (and not
on $n$ or the particular Markov chain) such that

$$\sum_{a\in I}\nu(a)\int_0^\infty 1_{\{t\ge 0:\ \|P^a_{3t}-P^a_t\|_{TV}>\delta\}}\,\frac{1}{t}\,dt \le C(\delta)\,H(\nu). \tag{1.1}$$

In particular, for every $\delta, \epsilon > 0$, there exists a constant $C_\epsilon(\delta) > 0$ depending
only on $\delta$ and $\epsilon$ (and not on $n$ or the particular Markov chain) such that

$$\nu\Big(\Big\{a\in I:\ \int_0^\infty 1_{\{t\ge 0:\ \|P^a_{3t}-P^a_t\|_{TV}>\delta\}}\,\frac{1}{t}\,dt > C_\epsilon(\delta)\,H(\nu)\Big\}\Big) < \epsilon. \tag{1.2}$$

(b) For every $\delta > 0$, there is a constant $C(\delta) > 0$ depending only on $\delta$ (and not
on $n$ or the particular Markov chain) such that

$$\sum_{(a,b)\in I^2}\nu(a)\,\nu(b)\int_0^\infty 1_{\{t\ge 0:\ \|P^a_t-P^b_t\|_{TV}-\|P^a_{3t}-P^b_{3t}\|_{TV}>\delta\}}\,\frac{1}{t}\,dt \le C(\delta)\,H(\nu). \tag{1.3}$$

In particular, for every $\delta, \epsilon > 0$, there exists a constant $C_\epsilon(\delta) > 0$ depending
only on $\delta$ and $\epsilon$ (and not on $n$ or the particular Markov chain) such that

$$(\nu\times\nu)\Big(\Big\{(a,b)\in I^2:\ \int_0^\infty 1_{\{t\ge 0:\ \|P^a_t-P^b_t\|_{TV}-\|P^a_{3t}-P^b_{3t}\|_{TV}>\delta\}}\,\frac{1}{t}\,dt > C_\epsilon(\delta)\,H(\nu)\Big\}\Big) < \epsilon. \tag{1.4}$$

We remark at this point that universal estimates as in Theorem 1 can only be
obtained under the scale-invariant measure $\frac{1}{t}\,dt$ on the time axis $[0,\infty)$, which
has the property

$$\int_{t_1}^{t_2}\frac{1}{t}\,dt = \int_{\eta t_1}^{\eta t_2}\frac{1}{t}\,dt \tag{1.5}$$

for all $\eta > 0$ and $0 < t_1 < t_2 < \infty$. This can be easily seen by slowing down or
speeding up the chain by a constant factor.
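The scale invariance of the measure with density $\frac{1}{t}$ is a one-line substitution. As a worked step (with $s$ an auxiliary integration variable introduced here), put $t = \eta s$:

```latex
\int_{\eta t_1}^{\eta t_2} \frac{1}{t}\,dt
  = \int_{t_1}^{t_2} \frac{1}{\eta s}\,\eta\,ds
  = \int_{t_1}^{t_2} \frac{1}{s}\,ds
  = \log\frac{t_2}{t_1}.
```

In particular, the measure of a time interval depends only on the ratio of its endpoints, so it is unchanged when all transition rates of the chain are multiplied by a common factor.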

To give an example of a result in the framework of $L^2$ convergence, some
auxiliary notation is needed. For simplicity, we assume for the moment that
$X$ is irreducible and reversible with respect to the uniform distribution on $I$.
In this case, writing $\mathcal{L}$ for the generating matrix of $X$, we can conclude that
the matrix $-\mathcal{L}$ is symmetric and admits an orthonormal basis of eigenvectors
$v_1, v_2, \dots, v_n$ corresponding to eigenvalues $0 = \lambda_1 < \lambda_2 \le \lambda_3 \le \dots \le \lambda_n$. Fixing
a pair of initial states $(a, b)$ and letting $e_a$ (resp. $e_b$) be the vector whose only
non-zero component is the $a$-th one (resp. the $b$-th one) and equals $1$, we
have the decomposition

$$e_a - e_b = \sum_{l=2}^n \mu_l\, v_l. \tag{1.6}$$

Finally, we define the set

$$A(a,b) := \{0,\ \mu_2^2,\ \mu_2^2+\mu_3^2,\ \dots,\ \mu_2^2+\mu_3^2+\dots+\mu_n^2\} \subset [0,2] \tag{1.7}$$

and a family of its neighborhoods

$$A^\delta(a,b) := [0,\ \delta\mu_2^2] \cup [(1-\delta)\mu_2^2,\ \mu_2^2+\delta\mu_3^2] \cup \dots \cup [2-\delta\mu_n^2,\ 2] \tag{1.8}$$

for $\delta \in (0, \frac{1}{2})$, and write $\|\cdot\|_2$ for the $L^2$ norm with respect to the counting
measure on $I$.

Theorem 2. In the setting just described the following is true. For every
$\delta \in (0, \frac{1}{2})$ there is a constant $K(\delta) > 0$ such that

$$\int_0^\infty 1_{\{t\ge 0:\ \|P^a_t-P^b_t\|_2^2 \notin A^\delta(a,b)\}}\,\frac{1}{t}\,dt \le K(\delta)\,n \tag{1.9}$$

for all pairs of initial states $a, b$. The constant $K(\delta)$ depends only on $\delta$, but not
on $a$, $b$, $n$ or the particular Markov chain $X$.

The rest of the paper is organized as follows. In section 2.1, we prove a
stronger version of Theorem 1 in the case that the invariant distribution $\nu$ is
the uniform distribution on $I$. In order to do this, we show a novel entropy
identity (see Lemma 4) allowing us to relate the increase in the entropy of
the one-dimensional distributions of the Markov chain to the convergence of
the chain to its equilibrium. In section 2.2, we prove Theorem 1 by suitably
adapting the entropy identity of Lemma 4 to the general setting. In section 3.1,
we give a global (or, averaged) version of Theorem 2 and present the proof of
Theorem 2. Subsequently, we explain in section 3.2 how Theorem 2 extends to
general continuous time irreducible reversible finite-state Markov chains. Then,
in section 4, we present a universal way of defining the ultrametric partition
structure on the state space of a continuous time irreducible reversible
finite-state Markov chain. Finally, in section 5, we show, in the case that the chain
is reversible with respect to the uniform distribution, how the entropy of the
one-dimensional distributions of the chain can be used to obtain a control on
the global convergence of the chain to its equilibrium.

2. Estimates in total variation norm 

In this section we give a control on the convergence to equilibrium and the 
times of coupling with respect to the total variation norm by analyzing the 
change in the entropy of the Markov chain over time. 
2.1. Markov chains reversible with respect to the uniform distribution.
The following theorem is a stronger version of Theorem 1 for the special
case of Markov chains which are reversible with respect to the uniform
distribution.

Theorem 3. Consider the setting of Theorem 1 and assume, in addition, that
the Markov chain $X$ is reversible with respect to the uniform distribution on
$I = \{1, 2, \dots, n\}$. Then:

(a) There is a constant $C(\delta) > 0$ depending only on $\delta$ (and not on $n$ or the
particular Markov chain) such that for all initial states $a$ of the Markov
chain:

$$\int_0^\infty 1_{\{t\ge 0:\ \|P^a_{3t}-P^a_t\|_{TV}>\delta\}}\,\frac{1}{t}\,dt \le C(\delta)\,\log n. \tag{2.1}$$

(b) There is a constant $C(\delta) > 0$ depending only on $\delta$ (and not on $n$ or the
particular Markov chain) such that for all pairs $(a, b)$ of initial states of the
Markov chain:

$$\int_0^\infty 1_{\{t\ge 0:\ \|P^a_t-P^b_t\|_{TV}-\|P^a_{3t}-P^b_{3t}\|_{TV}>\delta\}}\,\frac{1}{t}\,dt \le C(\delta)\,\log n. \tag{2.2}$$

The proof relies on the following entropy identity. 

Lemma 4. Let $X(t)$, $t \ge 0$ be a Markov chain as in Theorem 3, started in an
initial state $a \in I$. Then, for all $t > 0$:

$$H(P^a_{2t}) - H(P^a_t) = H\big(P^a_{t,2t}\,\big|\,P^a_{3t,2t}\big). \tag{2.3}$$

Hereby, $H(\cdot\,|\,\cdot)$ stands for the relative entropy and $P^a_{u,s}$ stands for the law of the
random vector $(X(u), X(s))$. In particular, the inequality

$$\|P^a_t - P^a_{3t}\|_{TV} \le \sqrt{2\big(H(P^a_{2t}) - H(P^a_t)\big)} \tag{2.4}$$

holds for all $t > 0$ and all initial states $a \in I$.

Proof of Lemma 4. We start the proof with the following elementary
computation, which only relies on the Markov property of $X$:

$$\begin{aligned}
H(P^a_{2t}) - H(P^a_t)
&= -\sum_{i\in I} P^a_{2t}(i)\log P^a_{2t}(i) + \sum_{j\in I} P^a_t(j)\log P^a_t(j) \\
&= -\sum_{j\in I} P^a_t(j)\Big\{\Big(\sum_{i\in I} P^j_t(i)\log P^a_{2t}(i)\Big) - \log P^a_t(j)\Big\} \\
&= \sum_{j\in I} P^a_t(j)\sum_{i\in I} P^j_t(i)\big\{\log P^a_t(j) - \log P^a_{2t}(i)\big\}.
\end{aligned}$$

We now exploit the symmetry of the transition matrices of the Markov chain
$X$ (which is due to the reversibility of the uniform distribution and the detailed
balance condition) to deduce

$$P^a_{2t}(i)\,P^j_t(i) = P^a_{2t}(i)\,P^i_t(j) \tag{2.5}$$

for all $(i,j) \in I^2$. In addition, the Markov property of $X$ yields

$$P^a_{2t}(i)\,P^i_t(j) = \mathbb{P}^a(X(2t) = i,\ X(3t) = j), \tag{2.6}$$

$$P^a_t(j)\,P^j_t(i) = \mathbb{P}^a(X(t) = j,\ X(2t) = i) \tag{2.7}$$

for all $(i,j) \in I^2$. Putting the latter three observations together, we end up
with the lemma. □
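The identity (2.3) can be checked numerically on a small example. The following is a minimal sketch, not part of the original argument: it assumes a hypothetical symmetric generator on four states, computes $e^{t\mathcal{L}}$ by a truncated Taylor series (adequate only for small matrices and moderate $t$), and compares the entropy increment with the relative entropy of the two-time laws. All function names are illustrative.

```python
import math


def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(L, t, terms=60):
    """exp(t * L) by a truncated Taylor series (adequate for small t * |L|)."""
    n = len(L)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for m in range(1, terms):
        term = mat_mul(term, [[t * x / m for x in row] for row in L])
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result


def entropy(p):
    """Shannon entropy of a probability vector."""
    return -sum(x * math.log(x) for x in p if x > 0)


def identity_sides(L, t, a):
    """Return (H(P_2t^a) - H(P_t^a), H(P_{t,2t}^a | P_{3t,2t}^a))."""
    P1 = mat_exp(L, t)
    P2 = mat_mul(P1, P1)
    n = len(L)
    lhs = entropy(P2[a]) - entropy(P1[a])
    rel = 0.0
    for j in range(n):
        for i in range(n):
            p = P1[a][j] * P1[j][i]  # P^a(X(t) = j, X(2t) = i)
            q = P2[a][i] * P1[i][j]  # P^a(X(3t) = j, X(2t) = i)
            rel += p * math.log(p / q)
    return lhs, rel


# Symmetric generator of a path 0 - 1 - 2 - 3 with a slow middle edge
# (hypothetical rates chosen for illustration).
L = [[-1.0, 1.0, 0.0, 0.0],
     [1.0, -1.2, 0.2, 0.0],
     [0.0, 0.2, -1.2, 1.0],
     [0.0, 0.0, 1.0, -1.0]]
lhs, rel = identity_sides(L, t=0.5, a=0)
```

Under these assumptions the two returned numbers should agree up to floating point error, and both should be positive, reflecting the fact that the entropy increases along the doubling $t \mapsto 2t$.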

In the proof of Theorem 3 we will need the following simple calculus lemma.

Lemma 5. Let $g : \mathbb{R} \to [0, \infty)$ be a non-decreasing function, which satisfies

$$\lim_{u\to-\infty} g(u) = p, \qquad \lim_{u\to\infty} g(u) = q \tag{2.8}$$

for some non-negative real constants $p \le q$. Then, for every $r > 0$ and $\epsilon > 0$,
one has the inequality

$$\int_{-\infty}^\infty 1_{\{g(u+r)-g(u)>\epsilon\}}\,du \le \frac{r(q-p)}{\epsilon} \le \frac{rq}{\epsilon}. \tag{2.9}$$

Proof of Lemma 5. It suffices to observe the elementary inequality

$$1_{\{g(u+r)-g(u)>\epsilon\}} \le \frac{g(u+r)-g(u)}{\epsilon}, \tag{2.10}$$

which leads to

$$\int_{-\infty}^\infty 1_{\{g(u+r)-g(u)>\epsilon\}}\,du \le \frac{1}{\epsilon}\lim_{K\to\infty}\Big(\int_{-K+r}^{K+r} g(u)\,du - \int_{-K}^{K} g(u)\,du\Big) = \frac{r(q-p)}{\epsilon}$$

and, hence, yields the lemma. □
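As an illustration of Lemma 5, the following sketch approximates the left-hand side of (2.9) by a Riemann sum for one concrete choice of $g$ and compares it with the bound $r(q-p)/\epsilon$. The logistic shape of $g$, the integration window and the step size are assumptions made only for this example.

```python
import math


def level_set_measure(g, r, eps, lo=-40.0, hi=40.0, step=1e-3):
    """Riemann-sum approximation of the Lebesgue measure of {u : g(u + r) - g(u) > eps}."""
    count = int(round((hi - lo) / step))
    hits = sum(1 for k in range(count)
               if g(lo + k * step + r) - g(lo + k * step) > eps)
    return hits * step


p, q = 0.0, 1.0  # limits of g at -infinity and +infinity


def g(u):
    """A hypothetical non-decreasing function with limits p and q."""
    return p + (q - p) / (1.0 + math.exp(-u))


r, eps = math.log(2.0), 0.1
measure = level_set_measure(g, r, eps)
bound = r * (q - p) / eps  # right-hand side of (2.9)
```

With $r = \log 2$ this is the situation used in the proof of Theorem 3, where $g(u)$ is the entropy at time $e^u$ and a shift by $\log 2$ in $u$ corresponds to doubling the time.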

We are now ready for the proof of Theorem 3.

Proof of Theorem 3. First, we note that part (b) of the theorem is a
consequence of part (a) due to the inequalities

$$\|P^a_t - P^b_t\|_{TV} - \|P^a_{3t} - P^b_{3t}\|_{TV} \le \|P^a_t - P^a_{3t}\|_{TV} + \|P^b_t - P^b_{3t}\|_{TV} \tag{2.11}$$

and

$$1_{\{\|P^a_t-P^a_{3t}\|_{TV}+\|P^b_t-P^b_{3t}\|_{TV}>\delta\}} \le 1_{\{\|P^a_t-P^a_{3t}\|_{TV}>\delta/2\}} + 1_{\{\|P^b_t-P^b_{3t}\|_{TV}>\delta/2\}}. \tag{2.12}$$

We turn now to the proof of part (a). Due to the inequality (2.4), it suffices to
prove that for every $\delta > 0$ there is a constant $C(\delta) > 0$ depending only on $\delta$
(and not on $a$, $n$ or the Markov chain $X$) such that

$$\int_0^\infty 1_{\{t\ge 0:\ H(P^a_{2t})-H(P^a_t)>\delta\}}\,\frac{1}{t}\,dt \le C(\delta)\,\log n. \tag{2.13}$$

Introducing the function $g : \mathbb{R} \to [0, \infty)$, $g(u) = H(P^a_{e^u})$, we can rewrite the
latter inequality as

$$\int_{-\infty}^\infty 1_{\{u:\ g(u+\log 2)-g(u)>\delta\}}\,du \le C(\delta)\,\log n. \tag{2.14}$$

Noting that $\lim_{u\to\infty} g(u) = \lim_{t\to\infty} H(P^a_t) = \log n$ (since the uniform
distribution is the unique stationary distribution of $X$), we see that the desired
inequality holds with $C(\delta) = \frac{\log 2}{\delta}$ as a consequence of Lemma 5 (applied with
$r = \log 2$, $q = \log n$ and $\epsilon = \delta$). □



2.2. General reversible Markov chains. In this subsection we consider a
general continuous time irreducible reversible Markov chain $X$ on $I$ and
prove Theorem 1. To start with, we recall the detailed balance condition:

$$\nu(a)\,\mathbb{P}^a(X(t) = b) = \nu(b)\,\mathbb{P}^b(X(t) = a), \tag{2.15}$$

which holds for all times $t \ge 0$ and all pairs of states $(a, b) \in I^2$. We now give
the proof of Theorem 1.

Proof of Theorem 1. The first assertion in part (b) of the theorem is a
direct consequence of the inequality (2.12) (which clearly remains true in the
more general setting of the present theorem) and the first assertion in part (a)
of the theorem. Moreover, the second assertions in both parts of the theorem
follow from the first assertions in the corresponding parts of the theorem and
Markov's inequality. For these reasons, we only need to prove the first assertion
in part (a) of the theorem.

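To this end, we fix an initial state $a \in I$ and note that the same computation
as in the proof of Lemma 4 above yields:

$$H(P^a_{2t}) - H(P^a_t) = \sum_{(i,j)\in I^2} P^a_t(j)\,P^j_t(i)\,\log\frac{P^a_t(j)\,P^j_t(i)}{P^a_{2t}(i)\,P^j_t(i)}.$$

As before, we have by the Markov property

$$P^a_t(j)\,P^j_t(i) = \mathbb{P}^a(X(t) = j,\ X(2t) = i). \tag{2.16}$$

Moreover, the detailed balance condition (2.15) gives

$$P^a_{2t}(i)\,P^j_t(i) = P^a_{2t}(i)\,P^i_t(j)\,\frac{\nu(i)}{\nu(j)} = \mathbb{P}^a(X(2t) = i,\ X(3t) = j)\,\frac{\nu(i)}{\nu(j)}. \tag{2.17}$$

Plugging this in, we get

$$H(P^a_{2t}) - H(P^a_t) = H\big(P^a_{t,2t}\,\big|\,P^a_{3t,2t}\big) + \sum_{(i,j)\in I^2} \mathbb{P}^a(X(t) = j,\ X(2t) = i)\,\log\frac{\nu(j)}{\nu(i)}, \tag{2.18}$$

where $P^a_{t,2t}$ and $P^a_{3t,2t}$ denote the laws of the random vectors $(X(t), X(2t))$
and $(X(3t), X(2t))$, conditioned on $X(0) = a$. In addition, writing
$\log\frac{\nu(j)}{\nu(i)} = \log\nu(j) - \log\nu(i)$ and summing, we obtain

$$H(P^a_{2t}) - H(P^a_t) - H\big(P^a_{t,2t}\,\big|\,P^a_{3t,2t}\big) = \sum_{j\in I} \mathbb{P}^a(X(t) = j)\,\log\nu(j) - \sum_{i\in I} \mathbb{P}^a(X(2t) = i)\,\log\nu(i).$$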
iei jei 
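Finally, integrating both sides of the latter equation with respect to $\nu$ and using
the fact that $\nu$ is the invariant distribution of the Markov chain $X$, we end up
with the averaged entropy identity

$$\sum_{a\in I}\nu(a)\big(H(P^a_{2t}) - H(P^a_t)\big) = \sum_{a\in I}\nu(a)\,H\big(P^a_{t,2t}\,\big|\,P^a_{3t,2t}\big). \tag{2.19}$$

In particular, this implies the inequality

$$\sum_{a\in I}\nu(a)\big(H(P^a_{2t}) - H(P^a_t)\big) \ge \frac{1}{2}\sum_{a\in I}\nu(a)\,\|P^a_t - P^a_{3t}\|_{TV}^2. \tag{2.20}$$

On the other hand, the first inequality in part (a) of the theorem is equivalent
to

$$\sum_{a\in I}\nu(a)\int_{-\infty}^\infty 1_{\{u\in\mathbb{R}:\ \|P^a_{3e^u}-P^a_{e^u}\|_{TV}>\delta\}}\,du \le C(\delta)\,H(\nu). \tag{2.21}$$

This in turn would follow from

$$1_{\{u\in\mathbb{R}:\ \|P^a_{3e^u}-P^a_{e^u}\|_{TV}>\delta\}} \le \frac{\|P^a_{3e^u}-P^a_{e^u}\|_{TV}^2}{\delta^2}$$

if we can prove

$$\sum_{a\in I}\nu(a)\int_{-\infty}^\infty \frac{\|P^a_{3e^u}-P^a_{e^u}\|_{TV}^2}{\delta^2}\,du \le C(\delta)\,H(\nu). \tag{2.22}$$

However, due to the estimate (2.20), the left-hand side in the latter inequality
is bounded above by

$$\sum_{a\in I}\nu(a)\int_{-\infty}^\infty \frac{2\big(H(P^a_{2e^u}) - H(P^a_{e^u})\big)}{\delta^2}\,du
= \frac{2}{\delta^2}\sum_{a\in I}\nu(a)\lim_{K\to\infty}\Big(\int_{-K+\log 2}^{K+\log 2} H(P^a_{e^u})\,du - \int_{-K}^{K} H(P^a_{e^u})\,du\Big)
= \frac{2\log 2\; H(\nu)}{\delta^2}.$$

This finishes the proof. □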
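The averaged entropy identity (2.19) can also be checked numerically for a chain whose invariant distribution is not uniform. The sketch below assumes a hypothetical birth-death chain on three states with invariant distribution $(0.6, 0.3, 0.1)$ and again uses a truncated Taylor series for the matrix exponential; it is an illustration under these assumptions, not part of the proof.

```python
import math


def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(L, t, terms=60):
    """exp(t * L) by a truncated Taylor series (adequate for small t * |L|)."""
    n = len(L)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for m in range(1, terms):
        term = mat_mul(term, [[t * x / m for x in row] for row in L])
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result


def entropy(p):
    """Shannon entropy of a probability vector."""
    return -sum(x * math.log(x) for x in p if x > 0)


def averaged_identity_sides(L, nu, t):
    """Both sides of (2.19) for a generator L with invariant distribution nu."""
    P1 = mat_exp(L, t)
    P2 = mat_mul(P1, P1)
    n = len(L)
    lhs = sum(nu[a] * (entropy(P2[a]) - entropy(P1[a])) for a in range(n))
    rhs = 0.0
    for a in range(n):
        for j in range(n):
            for i in range(n):
                p = P1[a][j] * P1[j][i]  # P^a(X(t) = j, X(2t) = i)
                q = P2[a][i] * P1[i][j]  # P^a(X(3t) = j, X(2t) = i)
                rhs += nu[a] * p * math.log(p / q)
    return lhs, rhs


# Birth-death chain on {0, 1, 2} with hypothetical rates; L[i][j] is the rate
# from i to j, and nu satisfies detailed balance: nu[i] * L[i][j] == nu[j] * L[j][i].
L = [[-1.0, 1.0, 0.0],
     [2.0, -2.5, 0.5],
     [0.0, 1.5, -1.5]]
nu = [0.6, 0.3, 0.1]
lhs, rhs = averaged_identity_sides(L, nu, t=0.4)
```

For a single starting state the two quantities generally differ when $\nu$ is not uniform; only after averaging over $a$ with weights $\nu(a)$ do the correction terms involving $\log\nu$ cancel.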



3. Estimates in $L^2$ norm

Throughout the first subsection of this section, we assume for simplicity of
notation that the continuous time Markov chain $X$ is irreducible and reversible
with respect to the uniform distribution on the set $I = \{1,2,\dots,n\}$. We first
give a global version of Theorem 2 in Theorem 6 and then prove Theorem 2
at the end of the first subsection. Subsequently, in the second subsection, we
give the analogues of these results for a general continuous time irreducible
reversible Markov chain on $I$.
3.1. Markov chains reversible with respect to the uniform distribution.
In the following theorem we show that, for most of the time on the
scale-invariant clock, the square of the $L^2$ distance between the one-dimensional
distributions of the Markov chain started in $a$ and the one-dimensional
distributions of the Markov chain started in $b$, averaged over all pairs $(a, b) \in I^2$,
stays close to the lattice

$$A_L := \Big\{0,\ \frac{2}{n},\ \frac{4}{n},\ \dots,\ \frac{2(n-1)}{n}\Big\}. \tag{3.1}$$

This statement can be viewed as a global (or, averaged) version of Theorem
2. To make this statement precise, we write $A^\delta_L$ for the $\frac{2\delta}{n}$-neighborhood of $A_L$
in $\big[0, \frac{2(n-1)}{n}\big]$, where $\delta$ is a number in $(0, \frac{1}{2})$, and can formulate the following
result.

Theorem 6. In the setting of Theorem 2, for all $0 < \delta < \frac{1}{2}$, there exists a
constant $K(\delta) > 0$ such that the estimate

$$\int_0^\infty 1_{\{t\ge 0:\ \frac{1}{n^2}\sum_{(a,b)\in I^2}\|P^a_t-P^b_t\|_2^2 \notin A^\delta_L\}}\,\frac{1}{t}\,dt \le K(\delta)\,n \tag{3.2}$$

holds. Hereby, the constant $K(\delta)$ depends only on $\delta$, and not on $n$ or the
particular Markov chain $X$.



Proof. To start with, we recall the notation $\mathcal{L}$ for the generating matrix of the
Markov chain $X$, so that, in particular, the transition matrix $P_t$ corresponding
to a time $t \ge 0$ is given by $e^{t\mathcal{L}}$. Since $X$ is irreducible and reversible with respect
to the uniform distribution, the matrix $-\mathcal{L}$ is symmetric and non-negative
definite and has the eigenvalues $0 = \lambda_1 < \lambda_2 \le \lambda_3 \le \dots \le \lambda_n$. In particular,
each of the matrices $P_t$, $t \ge 0$ is symmetric, positive definite and has the
eigenvalues

$$1,\ e^{-\lambda_2 t},\ e^{-\lambda_3 t},\ \dots,\ e^{-\lambda_n t}. \tag{3.3}$$

Writing $\|\cdot\|_2$ for the $L^2$ norm with respect to the counting measure on $I$ and
$(\cdot,\cdot)_2$ for the corresponding scalar product, we can make the following
computation:

$$\begin{aligned}
\sum_{(a,b)\in I^2}\|P^a_t - P^b_t\|_2^2
&= 2n\sum_{a\in I}\|P^a_t\|_2^2 - 2\sum_{(a,b)\in I^2}(P^a_t, P^b_t)_2 \\
&= 2n\sum_{(a,c)\in I^2}\mathbb{P}^a(X(t) = c)^2 - 2\Big(\sum_{a\in I}P^a_t,\ \sum_{b\in I}P^b_t\Big)_2 \\
&= 2n\big(e^{-2\lambda_2 t} + e^{-2\lambda_3 t} + \dots + e^{-2\lambda_n t}\big),
\end{aligned}$$

which is valid for all $t \ge 0$.
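The computation above can be checked on a chain whose spectrum is known in closed form. The sketch below assumes the rate-1 nearest-neighbor walk on a cycle with $n = 5$ vertices, for which the eigenvalues of $-\mathcal{L}$ are $2(1 - \cos(2\pi k/n))$, $k = 0, \dots, n-1$, and compares both sides of the identity; the matrix exponential is again computed by a truncated Taylor series, and all names are illustrative.

```python
import math


def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(L, t, terms=60):
    """exp(t * L) by a truncated Taylor series (adequate for small t * |L|)."""
    n = len(L)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for m in range(1, terms):
        term = mat_mul(term, [[t * x / m for x in row] for row in L])
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result


def sum_pairwise_sq_dist(P):
    """Sum over all pairs (a, b) of the squared Euclidean distance between rows a and b."""
    n = len(P)
    return sum((P[a][c] - P[b][c]) ** 2
               for a in range(n) for b in range(n) for c in range(n))


n, t = 5, 0.4
# Generator of the rate-1 nearest-neighbor walk on the n-cycle (symmetric).
L = [[0.0] * n for _ in range(n)]
for i in range(n):
    L[i][(i + 1) % n] = 1.0
    L[i][(i - 1) % n] = 1.0
    L[i][i] = -2.0
# Eigenvalues of -L for the cycle: 2 * (1 - cos(2 pi k / n)), k = 0, ..., n - 1.
lams = [2.0 * (1.0 - math.cos(2.0 * math.pi * k / n)) for k in range(n)]

lhs = sum_pairwise_sq_dist(mat_exp(L, t))
rhs = 2 * n * sum(math.exp(-2.0 * lam * t) for lam in lams[1:])
```

The zero eigenvalue `lams[0]` is excluded on the right-hand side, matching the sum over $l = 2, \dots, n$ in the text.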

Next, we set

$$f(t) := e^{-2\lambda_2 t} + e^{-2\lambda_3 t} + \dots + e^{-2\lambda_n t}, \tag{3.4}$$

$$t_k := \inf\{t \ge 0:\ f(t) \le k - \delta\}, \quad k = 1, 2, \dots, n-1. \tag{3.5}$$

The continuity of the function $f$ implies $f(t_k) = k - \delta$. In particular, it follows
that

$$f(2t_k) \le \max_{x_1+x_2+\dots+x_{n-1}=k-\delta,\ 0\le x_l\le 1}\big(x_1^2 + x_2^2 + \dots + x_{n-1}^2\big). \tag{3.6}$$

Moreover, since the maximum of the convex function

$$(x_1, \dots, x_{n-1}) \mapsto x_1^2 + x_2^2 + \dots + x_{n-1}^2$$

is taken over a convex set, it must be attained at a boundary point of that set. In
other words, at the optimizing point it must hold $x_l \in \{0, 1\}$ for at least one
$1 \le l \le n-1$. Eliminating the corresponding variable, we obtain a maximization
problem of the same type and conclude that at least one other coordinate
$x_{l'}$ has to belong to the set $\{0, 1\}$. Proceeding with the same argument, we
conclude that for each optimizing point $(x_1, x_2, \dots, x_{n-1})$, there must be $(k-1)$
coordinates which are equal to $1$, $(n-k-1)$ coordinates which are equal to
$0$, and one coordinate which is equal to $1 - \delta$. Thus, we have:

$$f(2t_k) \le f(t_k) - (1-\delta) + (1-\delta)^2 = f(t_k) - \delta(1-\delta). \tag{3.7}$$

Now, either $f(2t_k) \le (k-1) + \delta$, or we can proceed with the same argument
to conclude

$$f(4t_k) \le f(2t_k) - \delta(1-\delta) \le f(t_k) - 2\delta(1-\delta). \tag{3.8}$$

Proceeding further with the same argument, we end up with

$$f(2^R t_k) \le f(t_k) - (1-2\delta) = (k-1) + \delta \tag{3.9}$$

for $R = \Big\lceil \frac{1-2\delta}{\delta(1-\delta)} \Big\rceil$, where $\lceil\cdot\rceil$ denotes the closest integer from above.
Hence, setting

$$\tilde t_k := \inf\{t \ge 0:\ f(t) \le (k-1) + \delta\}, \quad k = 1, 2, \dots, n-1, \tag{3.10}$$

we have the estimate

$$\int_0^\infty 1_{[t_k, \tilde t_k]}(t)\,\frac{1}{t}\,dt = \log\frac{\tilde t_k}{t_k} \le \log 2\,\Big\lceil \frac{1-2\delta}{\delta(1-\delta)} \Big\rceil =: K(\delta). \tag{3.11}$$

Finally, using this and the identity

$$\int_0^\infty 1_{\{t\ge 0:\ \frac{1}{n^2}\sum_{(a,b)\in I^2}\|P^a_t-P^b_t\|_2^2 \notin A^\delta_L\}}\,\frac{1}{t}\,dt = \sum_{k=1}^{n-1}\int_0^\infty 1_{[t_k, \tilde t_k]}(t)\,\frac{1}{t}\,dt, \tag{3.12}$$

we readily obtain the theorem. □
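The doubling argument gives a bound on the logarithmic length of each transition window that does not depend on the spectrum. As a minimal sketch, assuming a hypothetical list of eigenvalues $\lambda_2 \le \dots \le \lambda_n$ and $\delta = 0.25$, one can locate the times at which $f$ first drops to the levels $k - \delta$ and $(k-1) + \delta$ by bisection and compare the logarithm of their ratio with $K(\delta)$:

```python
import math


def f(t, lams):
    """f(t) = sum over l >= 2 of exp(-2 * lambda_l * t), as in (3.4)."""
    return sum(math.exp(-2.0 * lam * t) for lam in lams)


def first_time_below(level, lams, tol=1e-12):
    """inf{t >= 0 : f(t) <= level}, found by bisection (f is continuous and decreasing)."""
    lo, hi = 0.0, 1.0
    while f(hi, lams) > level:
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if f(mid, lams) > level:
            lo = mid
        else:
            hi = mid
    return hi


lams = [0.01, 0.5, 3.0, 40.0]  # hypothetical eigenvalues lambda_2 <= ... <= lambda_n
delta = 0.25
K = math.log(2.0) * math.ceil((1 - 2 * delta) / (delta * (1 - delta)))

log_ratios = []
for k in range(1, len(lams) + 1):
    t_k = first_time_below(k - delta, lams)            # first time f <= k - delta, as in (3.5)
    t_k_tilde = first_time_below(k - 1 + delta, lams)  # first time f <= (k - 1) + delta, as in (3.10)
    log_ratios.append(math.log(t_k_tilde / t_k))
```

Each entry of `log_ratios` is the measure of one window under the density $\frac{1}{t}$; summing the bound $K(\delta)$ over the $n - 1$ windows is what produces the factor $n$ in the theorem.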

Remark 1. We note at this point that the main result of the article [15] implies
that, for any vector $(1, \Lambda_2, \Lambda_3, \dots, \Lambda_n)$ with

$$1 > \Lambda_2 \ge \Lambda_3 \ge \dots \ge \Lambda_n > 0, \tag{3.13}$$

there is a symmetric doubly stochastic matrix $S$ with eigenvalues $1, \Lambda_2, \Lambda_3, \dots, \Lambda_n$.

In particular, one can find a matrix $\mathcal{L} = S - \mathrm{Id}$ with the following two
properties:

(a) $\mathcal{L}$ generates a continuous time Markov chain, which is irreducible and
reversible with respect to the uniform measure.

(b) The matrix $-\mathcal{L}$ has the eigenvalues

$$(0, \lambda_2, \lambda_3, \dots, \lambda_n) = (0,\ 1-\Lambda_2,\ 1-\Lambda_3,\ \dots,\ 1-\Lambda_n) \tag{3.14}$$

for a given sequence $\Lambda_2, \Lambda_3, \dots, \Lambda_n$ as in (3.13).

This together with the proof of Theorem 6 shows that the order $n$ of the
upper bound in Theorem 6 is optimal. As will become clear from the proofs
below, the same is true for the upper bound of Theorem 2 and the counterparts
of these results for general continuous time irreducible reversible Markov chains
treated in section 3.2.



We proceed with the proof of Theorem 2.

Proof of Theorem 2. To start with, we introduce the following notations:

$$t_k(a,b) := \inf\Big\{t \ge 0:\ \|P^a_t - P^b_t\|_2^2 \le \sum_{l=2}^{k-1}\mu_l^2 + (1-\delta)\mu_k^2\Big\}, \quad k = 2, \dots, n,$$

$$\tilde t_k(a,b) := \inf\Big\{t \ge 0:\ \|P^a_t - P^b_t\|_2^2 \le \sum_{l=2}^{k-1}\mu_l^2 + \delta\mu_k^2\Big\}, \quad k = 2, \dots, n,$$

and note that

$$\int_0^\infty 1_{\{t\ge 0:\ \|P^a_t-P^b_t\|_2^2 \notin A^\delta(a,b)\}}\,\frac{1}{t}\,dt = \sum_{k=2}^n \log\frac{\tilde t_k(a,b)}{t_k(a,b)}. \tag{3.15}$$

From now on, we fix a $k \in \{2, 3, \dots, n\}$ and will show that

$$\log\frac{\tilde t_k(a,b)}{t_k(a,b)} \le K(\delta) \tag{3.16}$$

for a suitable constant $K(\delta) > 0$, which depends only on $\delta$ (but not on $a$, $b$, $k$
or $n$). To this end, we note that the identity

$$\|P^a_t - P^b_t\|_2^2 = \sum_{l=2}^n \mu_l^2\, e^{-2\lambda_l t}, \quad t \ge 0 \tag{3.17}$$

and the inequality $0 < \lambda_2 \le \lambda_3 \le \dots \le \lambda_n$ imply the estimate

$$\|P^a_{2t_k(a,b)} - P^b_{2t_k(a,b)}\|_2^2 \le \max \sum_{l=2}^n \mu_l^2\, x_l^2, \tag{3.18}$$

where the maximum is taken over all $1 \ge x_2 \ge x_3 \ge \dots \ge x_n \ge 0$ with
$\sum_{l=2}^n \mu_l^2 x_l = \|P^a_{t_k(a,b)} - P^b_{t_k(a,b)}\|_2^2$.
Moreover, if we have $\mu_l^2 > 0$ for all $l = 2, 3, \dots, n$, then the function

$$(x_2, x_3, \dots, x_n) \mapsto \sum_{l=2}^n \mu_l^2\, x_l^2$$

is strictly convex and must attain its maximum at a vertex of the convex
polyhedron

$$\Big\{(x_2, x_3, \dots, x_n):\ \sum_{l=2}^n \mu_l^2 x_l = \|P^a_{t_k(a,b)} - P^b_{t_k(a,b)}\|_2^2,\ 1 \ge x_2 \ge x_3 \ge \dots \ge x_n \ge 0\Big\}.$$

If we have $\mu_l^2 = 0$ for some $l \in \{2, 3, \dots, n\}$, then we can eliminate the
corresponding coordinate in the maximization problem and make the same
conclusion for the reduced maximization problem. For this reason, we may
assume without loss of generality that $\mu_l^2 > 0$ for all $l = 2, 3, \dots, n$. Moreover,
since the hyperplane $\sum_{l=2}^n \mu_l^2 x_l = \|P^a_{t_k(a,b)} - P^b_{t_k(a,b)}\|_2^2$ is $(n-2)$-dimensional,
the vertices of the polyhedron above are given by points $1 \ge x_2 \ge x_3 \ge \dots \ge
x_n \ge 0$, for which $(n-2)$ of the inequalities

$$1 \ge x_2,\quad x_2 \ge x_3,\quad \dots,\quad x_n \ge 0$$

are in fact equalities.



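Thus, each optimizing point of the maximization problem above can be
described as follows: There is a partition of $\{2, 3, \dots, n\}$ into three sets $I_1$, $I_2$, $I_3$
of the form $\{2, 3, \dots, l_1\}$, $\{l_1+1, l_1+2, \dots, l_2\}$, $\{l_2+1, l_2+2, \dots, n\}$,
respectively, such that, for all $l \in I_1$ it holds $x_l = 1$, for all $l \in I_2$ we have $x_l = \zeta$ for
a suitable $\zeta \in [0, 1]$, and for all $l \in I_3$ it holds $x_l = 0$. Moreover, the identity

$$\|P^a_{t_k(a,b)} - P^b_{t_k(a,b)}\|_2^2 = \sum_{l=2}^{k-1}\mu_l^2 + (1-\delta)\mu_k^2 \tag{3.19}$$

shows that the value of $\zeta$ is given by

$$\zeta = \frac{\big(\sum_{l=2}^{k-1}\mu_l^2\big) + (1-\delta)\mu_k^2 - \big(\sum_{l\in I_1}\mu_l^2\big)}{\sum_{l\in I_2}\mu_l^2} \tag{3.20}$$

and that $I_1 \subset \{2, 3, \dots, k-1\}$. To proceed, we introduce the set

$$J_2 := \big(\{2, 3, \dots, k-1\} \setminus I_1\big) \subset I_2 \tag{3.21}$$

and conclude

$$\zeta = \frac{\big(\sum_{l\in J_2}\mu_l^2\big) + (1-\delta)\mu_k^2}{\sum_{l\in I_2}\mu_l^2}. \tag{3.22}$$

This allows us to make the following computation:

$$\begin{aligned}
\|P^a_{2t_k(a,b)} - P^b_{2t_k(a,b)}\|_2^2
&\le \|P^a_{t_k(a,b)} - P^b_{t_k(a,b)}\|_2^2 - \Big(\sum_{l\in I_2}\mu_l^2\,\zeta\Big) + \Big(\sum_{l\in I_2}\mu_l^2\,\zeta^2\Big) \\
&= \|P^a_{t_k(a,b)} - P^b_{t_k(a,b)}\|_2^2 - \frac{\big(\sum_{l\in J_2}\mu_l^2 + (1-\delta)\mu_k^2\big)\big(\sum_{l\in I_2\setminus J_2}\mu_l^2 - (1-\delta)\mu_k^2\big)}{\sum_{l\in I_2}\mu_l^2}.
\end{aligned}$$

Next, we note that the latter fraction is of the form $\frac{AB}{A+B} = \frac{1}{\frac{1}{A}+\frac{1}{B}}$, whereby
$A \ge (1-\delta)\mu_k^2$ and $B \ge \delta\mu_k^2$. Thus,

$$\|P^a_{2t_k(a,b)} - P^b_{2t_k(a,b)}\|_2^2 \le \|P^a_{t_k(a,b)} - P^b_{t_k(a,b)}\|_2^2 - \frac{1}{\frac{1}{(1-\delta)\mu_k^2} + \frac{1}{\delta\mu_k^2}}
= \|P^a_{t_k(a,b)} - P^b_{t_k(a,b)}\|_2^2 - \frac{\mu_k^2}{\frac{1}{1-\delta} + \frac{1}{\delta}}.$$

Proceeding with the same argument, we conclude that

$$\|P^a_{2^R t_k(a,b)} - P^b_{2^R t_k(a,b)}\|_2^2 \le \|P^a_{t_k(a,b)} - P^b_{t_k(a,b)}\|_2^2 - \mu_k^2(1-2\delta) \tag{3.23}$$

for all natural numbers $R \ge \Big\lceil (1-2\delta)\Big(\frac{1}{1-\delta} + \frac{1}{\delta}\Big) \Big\rceil$. In particular, we conclude that

$$\tilde t_k(a,b) \le 2^{\lceil (1-2\delta)(\frac{1}{1-\delta} + \frac{1}{\delta}) \rceil}\; t_k(a,b), \tag{3.24}$$

where $\lceil\cdot\rceil$ denotes the closest integer from above. This shows the claim (3.16)
with

$$K(\delta) = \log 2\,\Big\lceil (1-2\delta)\Big(\frac{1}{1-\delta} + \frac{1}{\delta}\Big) \Big\rceil \tag{3.25}$$

and finishes the proof. □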

3.2. General reversible Markov chains. We proceed with the analogues of
Theorems 2 and 6 for a general continuous time irreducible reversible Markov
chain $X$. To state the results, we introduce the following notation. We
write $\nu$ for the invariant measure of $X$ as before, and let $D$ be the diagonal
matrix whose diagonal entries are given by $\nu(i)$, $i \in I$. Then, by the detailed
balance condition (2.15), the matrix $D^{1/2} P_t D^{-1/2}$ is symmetric for all $t \ge 0$.
Moreover, since the matrices $D^{1/2} P_t D^{-1/2}$, $t \ge 0$ commute, they have a joint
orthonormal basis of eigenvectors $v_1, v_2, \dots, v_n$ corresponding to the eigenvalues

$$1 > e^{-\lambda_2 t} \ge e^{-\lambda_3 t} \ge \dots \ge e^{-\lambda_n t}, \quad t > 0, \tag{3.26}$$

respectively (see chapter 3 of the book [1] for more details). In addition, for
any fixed pair $(a, b)$ of initial states, we let

$$D^{-1/2}(e_a - e_b) = \sum_{l=2}^n \mu_l\, v_l \tag{3.27}$$

be the expansion of the vector $D^{-1/2}(e_a - e_b)$ in terms of the basis $v_1, v_2, \dots, v_n$
(note that the vector $D^{-1/2}(e_a - e_b)$ is orthogonal to the eigenvector
corresponding to the eigenvalue $1$ of the matrices $D^{1/2} P_t D^{-1/2}$, $t \ge 0$). Finally, define the
sets

$$A_L := \{0, 1, 2, \dots, n-1\},$$

$$A(a,b) := \{0,\ \mu_2^2,\ \mu_2^2+\mu_3^2,\ \dots,\ \mu_2^2+\mu_3^2+\dots+\mu_n^2\}$$

and their neighborhoods

$$A^\delta_L := [0, \delta] \cup [1-\delta, 1+\delta] \cup [2-\delta, 2+\delta] \cup \dots \cup [n-1-\delta, n-1],$$

$$A^\delta(a,b) := [0,\ \delta\mu_2^2] \cup [(1-\delta)\mu_2^2,\ \mu_2^2+\delta\mu_3^2] \cup \dots \cup \Big[\sum_{l=2}^{n-1}\mu_l^2 + (1-\delta)\mu_n^2,\ \sum_{l=2}^{n}\mu_l^2\Big],$$

$0 < \delta < \frac{1}{2}$. With these notations, the analogues of Theorems 2 and 6 read as
follows.
follows. 

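Theorem 7. Let $\|\cdot\|_{L^2(\nu^{-1})}$ and $(\cdot,\cdot)_{L^2(\nu^{-1})}$ be the $L^2$ norm and scalar product
with respect to the weights $\nu(i)^{-1}$, $i \in I$. Then, for all $0 < \delta < \frac{1}{2}$, there is a
constant $K(\delta) > 0$ such that the inequalities

$$\int_0^\infty 1_{\{t\ge 0:\ \frac{1}{2}\sum_{(a,b)\in I^2}\nu(a)\nu(b)\|P^a_t-P^b_t\|^2_{L^2(\nu^{-1})} \notin A^\delta_L\}}\,\frac{1}{t}\,dt \le K(\delta)\,n \tag{3.28}$$

and

$$\int_0^\infty 1_{\{t\ge 0:\ \|P^a_t-P^b_t\|^2_{L^2(\nu^{-1})} \notin A^\delta(a,b)\}}\,\frac{1}{t}\,dt \le K(\delta)\,n \tag{3.29}$$

hold true. The constant $K(\delta)$ depends only on $\delta$, and not on $n$ or the particular
Markov chain $X$.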


Proof. In order to prove (3.28), we use the fact that $\nu$ is the invariant
distribution of the Markov chain $X$ to deduce the identities

$$\begin{aligned}
\sum_{(a,b)\in I^2}\nu(a)\nu(b)\,\|P^a_t - P^b_t\|^2_{L^2(\nu^{-1})}
&= 2\sum_{a\in I}\nu(a)\,\|P^a_t\|^2_{L^2(\nu^{-1})} - 2\Big(\sum_{a\in I}\nu(a)P^a_t,\ \sum_{b\in I}\nu(b)P^b_t\Big)_{L^2(\nu^{-1})} \\
&= 2\sum_{a\in I}\nu(a)\sum_{c\in I}P^a_t(c)^2\,\nu(c)^{-1} - 2\,(\nu, \nu)_{L^2(\nu^{-1})} \\
&= 2\sum_{(a,c)\in I^2}\nu(a)\,P^a_t(c)^2\,\nu(c)^{-1} - 2,
\end{aligned}$$

which hold for all $t \ge 0$. Moreover, the latter sum is given by the sum of squares
of the entries of the matrix $D^{1/2} P_t D^{-1/2}$ and is, hence, equal to $1 + \sum_{l=2}^n e^{-2\lambda_l t}$.
Thus,

$$\sum_{(a,b)\in I^2}\nu(a)\nu(b)\,\|P^a_t - P^b_t\|^2_{L^2(\nu^{-1})} = 2\sum_{l=2}^n e^{-2\lambda_l t}. \tag{3.30}$$

From this point on, one can proceed as in the proof of Theorem 6 to show
(3.28).

Now, we turn to the proof of (3.29). To this end, we note that the detailed
balance condition (2.15) implies $D P_t = P_t^T D$, $t \ge 0$, where the superscript $T$
stands for the transpose of a matrix. This allows us to make the computation

$$\begin{aligned}
P^a_t - P^b_t &= \big((e_a - e_b)^T P_t\big)^T = P_t^T (e_a - e_b) = D^{1/2}\big(D^{1/2} P_t D^{-1/2}\big) D^{-1/2}(e_a - e_b) \\
&= D^{1/2}\big(D^{1/2} P_t D^{-1/2}\big)\sum_{l=2}^n \mu_l\, v_l = \sum_{l=2}^n \mu_l\, e^{-\lambda_l t}\, D^{1/2} v_l
\end{aligned}$$

for all $t \ge 0$. Next, we observe that the vectors $D^{1/2} v_1, D^{1/2} v_2, \dots, D^{1/2} v_n$ form
an orthonormal basis with respect to the scalar product $(\cdot,\cdot)_{L^2(\nu^{-1})}$, since the
vectors $v_1, v_2, \dots, v_n$ form an orthonormal basis with respect to the standard
Euclidean scalar product. Hence,

$$\|P^a_t - P^b_t\|^2_{L^2(\nu^{-1})} = \sum_{l=2}^n \mu_l^2\, e^{-2\lambda_l t}. \tag{3.31}$$

From this point on, one only needs to follow the arguments in the proof of
Theorem 2 to end up with (3.29). □
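For a two-state chain everything in this proof can be written in closed form, which gives a quick sanity check. The sketch below assumes hypothetical jump rates $\alpha$ (from state 0 to state 1) and $\beta$ (from state 1 to state 0), so that $\nu = (\beta, \alpha)/(\alpha+\beta)$ and the only non-zero eigenvalue is $\lambda_2 = \alpha + \beta$. It checks that $D^{1/2} P_t D^{-1/2}$ is symmetric and that the weighted sum over pairs equals $2e^{-2\lambda_2 t}$, the value obtained from the chain of identities above.

```python
import math


def two_state_kernel(alpha, beta, t):
    """Transition matrix P_t of the two-state chain with rates 0 -> 1: alpha, 1 -> 0: beta."""
    lam = alpha + beta
    nu = [beta / lam, alpha / lam]
    decay = math.exp(-lam * t)
    # P_t = Pi + exp(-lam * t) * (Id - Pi), where every row of Pi equals nu.
    P = [[nu[j] + decay * ((1.0 if i == j else 0.0) - nu[j]) for j in range(2)]
         for i in range(2)]
    return P, nu, lam


def weighted_sq_dist(p, q, nu):
    """Squared L^2(nu^{-1}) distance: sum over c of (p(c) - q(c))^2 / nu(c)."""
    return sum((x - y) ** 2 / w for x, y, w in zip(p, q, nu))


alpha, beta, t = 0.3, 1.2, 0.7  # hypothetical rates and time
P, nu, lam = two_state_kernel(alpha, beta, t)

# D^{1/2} P_t D^{-1/2}, which should be symmetric by detailed balance.
S = [[math.sqrt(nu[i]) * P[i][j] / math.sqrt(nu[j]) for j in range(2)]
     for i in range(2)]

pair_sum = sum(nu[a] * nu[b] * weighted_sq_dist(P[a], P[b], nu)
               for a in range(2) for b in range(2))
```

The same check with more than two states would require a numerical eigendecomposition, which is omitted here to keep the example dependency-free.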



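Remark 2. It is worth noting that the estimates of Theorems 2, 6 and 7 hold
for

$$\|P^a_t - \nu\|_2^2,\quad \frac{2}{n}\sum_{a\in I}\|P^a_t - \nu\|_2^2,\quad \sum_{a\in I}\nu(a)\,\|P^a_t - \nu\|^2_{L^2(\nu^{-1})}\quad\text{and}\quad \|P^a_t - \nu\|^2_{L^2(\nu^{-1})}$$

in place of

$$\|P^a_t - P^b_t\|_2^2,\quad \frac{1}{n^2}\sum_{(a,b)\in I^2}\|P^a_t - P^b_t\|_2^2,\quad \frac{1}{2}\sum_{(a,b)\in I^2}\nu(a)\nu(b)\,\|P^a_t - P^b_t\|^2_{L^2(\nu^{-1})}\quad\text{and}\quad \|P^a_t - P^b_t\|^2_{L^2(\nu^{-1})},$$

respectively. The same proofs apply, with the only difference being that one
needs to expand the vectors $(e_a - \nu)$ and $D^{-1/2}(e_a - \nu)$ in terms of an
orthonormal basis of eigenvectors of the matrices $P_t$, $t \ge 0$ and $D^{1/2} P_t D^{-1/2}$, $t \ge 0$,
respectively.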

4. A universal approach to the ultrametric structure

In this section we provide a universal way of defining the ultrametric partition
structure on the state space $I = \{1, 2, \dots, n\}$ of a continuous time irreducible
Markov chain $X$, which is reversible with respect to its invariant distribution $\nu$.
Typical examples of such chains are encountered in statistical physics, where
often the transition rate for a pair $(a, b)$ of neighboring states is proportional
to $e^{-\beta(E(b)-E(a))_+}$ with $E$ being an energy functional (see the references given in
the introduction, as well as the references therein). For large values of $\beta$, the
energy landscape naturally provides a partition of the state space into states
of different types, which are separated by potential wells (see Figure 1 for a
schematic diagram).

Here, we will give a universal way of defining the partition structure without
making use of the explicit knowledge of the transition rates. Thereby, each of
the partitions will correspond to a time scale on which convergence to
equilibrium occurs for the Markov chain in consideration. For this purpose, we let
[Figure 1. A schematic diagram of an energy landscape (labels: potential well, states of type 1).]

|.| be any norm on the space of finite measures on the set /, which is normal- 
ized in such a way that ||7i"i — 7r 2 || < 1 for any two probability measures tti, 
7r 2 on I. Moreover, we assume that the function t i— > \\Tt\Pt — ^2-^*11 is strictly 
decreasing on [0, oo) and tends to zero in the limit t — > oo for all probability 
measures t\\ ^ 7r 2 on / (hereby, the products TCiPt, vr 2 Pt should be understood 
in the sense of multiplication of a probability measure by a stochastic kernel). 
Examples of such norms are the appropriately normalized total variation and 
L 2 norms discussed above. 
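The two requirements on the norm can be checked by hand in the simplest case. The following is a minimal sketch, assuming a hypothetical two-state chain with rates `lam` (from state 0 to 1) and `mu` (from state 1 to 0), for which $\pi P_t = \nu + e^{-(\lambda+\mu)t}(\pi - \nu)$ in closed form; the function names `tv` and `evolve` are illustrative and not part of the original text.

```python
import math

def tv(p, q):
    """Total variation distance, normalized to take values in [0, 1]."""
    return 0.5 * sum(abs(x - y) for x, y in zip(p, q))

def evolve(pi, t, lam=2.0, mu=1.0):
    """Law at time t of a two-state chain started from pi.

    Assumption: rates lam (0 -> 1) and mu (1 -> 0), so that
    pi P_t = nu + exp(-(lam + mu) t) (pi - nu) with nu the invariant law.
    """
    nu = (mu / (lam + mu), lam / (lam + mu))
    decay = math.exp(-(lam + mu) * t)
    return tuple(n + decay * (x - n) for x, n in zip(pi, nu))

pi1, pi2 = (1.0, 0.0), (0.0, 1.0)
times = [0.0, 0.1, 0.5, 1.0, 2.0, 5.0]
# Distance between the two evolved laws at each time in the grid.
distances = [tv(evolve(pi1, t), evolve(pi2, t)) for t in times]
```

In this example the distance starts at its maximal value 1 and decays like $e^{-(\lambda+\mu)t}$, so it is strictly decreasing and tends to zero, as required of the norm.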

Now, we fix an $0 < \epsilon < 1$ and recursively define equivalence relations
$\sim_1, \sim_2, \ldots$ on $I$, which will induce the desired sequence of nested
partitions. To define $\sim_1$, we set

(4.1) $t_1^* = \min_{(a,b):\, a \neq b} \inf\{t \ge 0 : \|P_t^a - \nu\| + \|P_t^b - \nu\| \le \epsilon\}$,

(4.2) $h_a^{(1)}(t) = \|P_{t_1^*}^a - \nu\|^k \cdot \|P_s^a - \nu\|$, for $t = k t_1^* + s$, $0 \le s < t_1^*$, $a \in I$.

Then, we let $a \sim_1 b$ iff either $a = b$, or

(4.3) $\limsup_{t \to \infty} \frac{t_1^*}{t} \log \big[h_a^{(1)}(t) + h_b^{(1)}(t)\big] \le \log(2\epsilon)$.

Now, to define $\sim_2$, we set

(4.4) $t_2^* = \min_{(a,b):\, a \not\sim_1 b} \inf\{t \ge 0 : \|P_t^a - \nu\| + \|P_t^b - \nu\| \le \epsilon\}$,

(4.5) $h_a^{(2)}(t) = \|P_{t_2^*}^a - \nu\|^k \cdot \|P_s^a - \nu\|$, for $t = k t_2^* + s$, $0 \le s < t_2^*$, $a \in I$.

Then, we let $a \sim_2 b$ iff either $a \sim_1 b$, or

(4.6) $\limsup_{t \to \infty} \frac{t_2^*}{t} \log \big[h_a^{(2)}(t) + h_b^{(2)}(t)\big] \le \log(2\epsilon)$.

The equivalence relations $\sim_3, \sim_4, \ldots$ are now defined analogously.

The intuition behind the definitions above can be explained as follows. For
each $l \in \mathbb{N}$, the time $t_l^*$ is defined as the first time at which there is a
pair of states $(a, b)$ which have not already been declared to be equivalent with
respect to $\sim_{l-1}$ and for which both the distance of $P_t^a$ and the distance of
$P_t^b$ from the equilibrium distribution $\nu$ are small. For such a pair $(a, b)$ the
relation $a \sim_l b$ is due to the following computation:

(4.7) $\limsup_{k \to \infty} \frac{1}{k} \log \big[h_a^{(l)}(k t_l^*) + h_b^{(l)}(k t_l^*)\big] \le \limsup_{k \to \infty} \frac{1}{k} \log(2\epsilon^k) = \log \epsilon$.

Increasing the right-hand side of the inequality defining $\sim_l$ to $\log(2\epsilon)$
allows us to find the pairs of states $(c, d)$ for which the distributions $P_t^c$,
$P_t^d$ approach the equilibrium distribution $\nu$ on approximately the same time
scale as $P_t^a$, $P_t^b$. The functions $h_c^{(l)}$, $h_d^{(l)}$ are hereby, in a suitable
sense, our best guess for the functions $t \mapsto \|P_t^c - \nu\|$,
$t \mapsto \|P_t^d - \nu\|$, if we only observe the latter on the time interval
$[0, t_l^*]$. The following proposition summarizes our findings.
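The first step of the construction can be traced numerically on a small example. The sketch below is only an illustration under stated assumptions: it uses a hypothetical three-state path chain 0 - 1 - 2 with unit rates (generator with rows (-1, 1, 0), (1, -2, 1), (0, 1, -1), eigenvalues 0, -1, -3, uniform invariant law), the normalized total variation norm, and it reads the criterion (4.3) as $\limsup_{t\to\infty} \frac{t_1^*}{t}\log[h_a^{(1)}(t) + h_b^{(1)}(t)] \le \log(2\epsilon)$. The closed forms in `d` were derived for this example only, and all function names are illustrative.

```python
import math

EPS = 0.25  # the fixed threshold 0 < epsilon < 1

def d(a, t):
    """TV distance of P_t^a from the uniform law for the path chain 0 - 1 - 2."""
    if a == 1:  # middle state: only the fast mode (rate 3) is present
        return (2.0 / 3.0) * math.exp(-3.0 * t)
    # end states: slow mode (rate 1) plus fast mode (rate 3)
    return 0.5 * math.exp(-t) + math.exp(-3.0 * t) / 6.0

def hitting_time(a, b, eps=EPS):
    """inf{t >= 0 : d(a, t) + d(b, t) <= eps}, by bisection (the sum is decreasing)."""
    lo, hi = 0.0, 50.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if d(a, mid) + d(b, mid) <= eps:
            hi = mid
        else:
            lo = mid
    return hi

pairs = [(0, 1), (0, 2), (1, 2)]
t_star = min(hitting_time(a, b) for a, b in pairs)  # the time t_1^* of (4.1)

def h(a, t):
    """Geometric extrapolation (4.2): t = k * t_star + s with 0 <= s < t_star."""
    k = int(t // t_star)
    s = t - k * t_star
    return d(a, t_star) ** k * d(a, s)

def rate(a, b, t):
    """Finite-t proxy for the limsup in (4.3), with the assumed factor t_star / t."""
    return (t_star / t) * math.log(h(a, t) + h(b, t))

t_large = 100.5 * t_star
rates = {pair: rate(pair[0], pair[1], t_large) for pair in pairs}
```

Under these assumptions the pair (0, 1) attains the minimum in (4.1), and every finite-time rate lies below $\log(2\epsilon)$, so all three states would be merged by $\sim_1$ in this example. The rate for the minimizing pair also lies below $\log \epsilon$, in line with the computation (4.7).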

Proposition 8. The relations $\sim_1, \sim_2, \ldots$ defined above are equivalence
relations and define a sequence of nested partitions of the state space
$I = \{1, 2, \ldots, n\}$. Moreover, $a \sim_l b$ holds for any pair $(a, b)$ which
achieves the minimum in

(4.8) $t_l^* = \min_{(a,b):\, a \not\sim_{l-1} b} \inf\{t \ge 0 : \|P_t^a - \nu\| + \|P_t^b - \nu\| \le \epsilon\}$

(where $a \not\sim_0 b$ is understood as $a \neq b$), and we have $a \sim_{n-1} b$ for
any pair $(a, b) \in I^2$.

Proof. Fix an $l \in \mathbb{N}$. To show that $\sim_l$ is an equivalence relation, we
only need to prove the transitivity of $\sim_l$. To this end, we observe that the
inequality

(4.9) $\big[h_a^{(l)}(t) + h_c^{(l)}(t)\big] \le \big[h_a^{(l)}(t) + h_b^{(l)}(t)\big] + \big[h_b^{(l)}(t) + h_c^{(l)}(t)\big]$, $t \ge 0$,

together with Lemma 1.2.15 in Chapter 1 of [11] yield

$\limsup_{t \to \infty} \frac{1}{t} \log \big[h_a^{(l)}(t) + h_c^{(l)}(t)\big] \le \max\Big( \limsup_{t \to \infty} \frac{1}{t} \log \big[h_a^{(l)}(t) + h_b^{(l)}(t)\big],\ \limsup_{t \to \infty} \frac{1}{t} \log \big[h_b^{(l)}(t) + h_c^{(l)}(t)\big] \Big)$

for all $(a, b, c) \in I^3$. Hence, the relations $a \sim_l b$ and $b \sim_l c$ together
imply $a \sim_l c$. Moreover, since $a \sim_{l-1} b$ implies $a \sim_l b$ by definition,
and $a \sim_l b$ holds for each pair $(a, b) \in I^2$ which achieves the minimum in
(4.8) (see the paragraph preceding the proposition), the number of equivalence
classes under $\sim_l$ is at most $n - l$. This shows $a \sim_{n-1} b$ for all pairs
$(a, b) \in I^2$. □
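The transitivity step rests on the principle that the exponential rate of a sum of two positive sequences is the maximum of their individual rates. A minimal numerical sketch of this principle follows, assuming two model sequences $e^{r_1 t}$ and $e^{r_2 t}$ with illustrative rates; the helper names are hypothetical, and the sum is evaluated in the log domain to avoid floating-point underflow.

```python
import math

def log_sum_exp(x, y):
    """Numerically stable evaluation of log(exp(x) + exp(y))."""
    m = max(x, y)
    return m + math.log(math.exp(x - m) + math.exp(y - m))

def rate_of_sum(r1, r2, t):
    """(1/t) log(exp(r1 t) + exp(r2 t)) for sequences with exponential rates r1, r2."""
    return log_sum_exp(r1 * t, r2 * t) / t

r_ab, r_bc = -1.0, -3.0  # illustrative rates of the two bracketed sums
approx = [rate_of_sum(r_ab, r_bc, t) for t in (1.0, 10.0, 100.0, 1000.0)]
```

As `t` grows, the entries of `approx` approach `max(r_ab, r_bc)` from above, which is the behaviour used in the display following (4.9).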

5. Bounds on the global convergence to equilibrium through the entropy

We have seen in section 2 that one can obtain a control on the convergence
to equilibrium and the times of coupling by analyzing the entropy that is
accumulated by the Markov chain over time. In this section, we pursue this idea
further and give estimates on the approach to equilibrium on subsets of
macroscopic size for continuous time irreducible Markov chains which are reversible
with respect to the uniform distribution. To this end, for each $0 < \kappa < 1$ and
$t \ge 0$, we introduce the set

(5.1) $EQ(t) = \Big\{ a \in I : P_t(a) \in \Big( \frac{1-\kappa}{n}, \frac{1+\kappa}{n} \Big) \Big\}$,

where, with a slight abuse of notation, we wrote $P_t$ for the law of the random
variable $X(t)$. For each $t \ge 0$, the set $EQ(t) \subset I$ should be viewed as the
part of the state space on which the probability measure $P_t$ is close to the
equilibrium distribution of the Markov chain $X$. We are interested in lower bounds
on the size $|EQ(t)|$ of such sets.
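For a concrete probability vector, the set in (5.1) and the entropy can be computed directly. The following is a small sketch with a made-up law on four states; the names `eq_set` and `entropy` are illustrative, and the interval in (5.1) is taken to be open.

```python
import math

def eq_set(p, kappa):
    """Indices a with p[a] strictly between (1 - kappa)/n and (1 + kappa)/n."""
    n = len(p)
    lo, hi = (1.0 - kappa) / n, (1.0 + kappa) / n
    return [a for a, pa in enumerate(p) if lo < pa < hi]

def entropy(p):
    """Shannon entropy with natural logarithm, using the convention 0 log 0 = 0."""
    return -sum(pa * math.log(pa) for pa in p if pa > 0.0)

p_t = [0.25, 0.25, 0.30, 0.20]      # hypothetical law of X(t) on n = 4 states
near_equilibrium = eq_set(p_t, kappa=0.1)
```

Here the admissible window is (0.225, 0.275), so only the first two states belong to the set, and the entropy of `p_t` is strictly below the maximal value `log 4`.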



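Theorem 9. Fix real numbers $0 < \kappa < 1$ and $0 < a < \frac{1}{2}$, set
$\alpha = \frac{1}{2} - a$, and, on the interval $[0, \alpha]$, define the function

$F(\alpha_1) = -\alpha_1 (1-\kappa) \log(1-\kappa) - (\alpha - \alpha_1)(1+\kappa) \log(1+\kappa) - \big(1 - (1-\kappa)\alpha_1 - (1+\kappa)(\alpha - \alpha_1)\big) \log \frac{1 - \alpha - \alpha\kappa + 2\kappa\alpha_1}{1 - \alpha}$,

taking non-positive values. Then, the entropy estimate

(5.2) $H(P_t) > \log n + \max_{0 \le \alpha_1 \le \alpha} F(\alpha_1)$

implies the lower bound

(5.3) $|EQ(t)| > a n$.

Hereby, depending on the values of $\kappa$ and $a$, the maximum in (5.2) is attained
at $0$, $\alpha$ or

$\alpha_1^* = \frac{(1-\kappa)^{-1/(2\kappa)} \Big( (1-\alpha) \sqrt{1-\kappa}\, (1+\kappa)^{(1+\kappa)/(2\kappa)} - e\, (1-\kappa)^{1/(2\kappa)} (1 - \alpha - \alpha\kappa) \Big)}{2 e \kappa}$.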
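The quantities in the theorem are explicit enough to evaluate numerically. The sketch below assumes the function $F$ in the form $F(\alpha_1) = -\alpha_1(1-\kappa)\log(1-\kappa) - (\alpha-\alpha_1)(1+\kappa)\log(1+\kappa) - g\log\frac{g}{1-\alpha}$ with $g = 1 - \alpha - \alpha\kappa + 2\kappa\alpha_1$, and the stationary point $\alpha_1^*$ as the unique zero of $F'$; the parameter values and the names `F`, `alpha_star`, `argmax_F` are illustrative choices.

```python
import math

def F(a1, kappa, alpha):
    """The one-variable function of Theorem 9 on [0, alpha]."""
    g = 1.0 - alpha - alpha * kappa + 2.0 * kappa * a1
    return (-a1 * (1.0 - kappa) * math.log(1.0 - kappa)
            - (alpha - a1) * (1.0 + kappa) * math.log(1.0 + kappa)
            - g * math.log(g / (1.0 - alpha)))

def alpha_star(kappa, alpha):
    """Candidate interior maximizer, i.e. the zero of the derivative of F."""
    e = math.e
    inner = ((1.0 - alpha) * math.sqrt(1.0 - kappa)
             * (1.0 + kappa) ** ((1.0 + kappa) / (2.0 * kappa))
             - e * (1.0 - kappa) ** (1.0 / (2.0 * kappa))
             * (1.0 - alpha - alpha * kappa))
    return (1.0 - kappa) ** (-1.0 / (2.0 * kappa)) * inner / (2.0 * e * kappa)

def argmax_F(kappa, alpha):
    """Compare the endpoints with alpha_star when the latter lies inside (0, alpha)."""
    candidates = [0.0, alpha]
    a_star = alpha_star(kappa, alpha)
    if 0.0 < a_star < alpha:
        candidates.append(a_star)
    return max(candidates, key=lambda a1: F(a1, kappa, alpha))

kappa, a = 0.5, 0.1
alpha = 0.5 - a
best = argmax_F(kappa, alpha)
n = 100  # hypothetical number of states
entropy_threshold = math.log(n) + F(best, kappa, alpha)  # right-hand side of (5.2)
```

For these sample values the maximizer is the interior point (approximately 0.173), the maximum of $F$ is negative, and `entropy_threshold` is therefore slightly below `log n`.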



Proof. We fix numbers $\kappa$ and $a$ as in the statement of the theorem and suppose
that the inequality (5.3) does not hold. We will show that this implies that the
entropy bound (5.2) cannot hold. To start with, we introduce the notation
$p_a := P_t(a)$, $a \in I$, and make the decomposition

(5.4) $H(P_t) = - \sum_{a \in EQ(t)} p_a \log p_a - \sum_{b \notin EQ(t)} p_b \log p_b$.

For a given value of $p := \sum_{a \in EQ(t)} p_a \in [0, 1]$, the maximum of the function
$-\sum_{b \notin EQ(t)} p_b \log p_b$ is attained on the interior boundary of the set

(5.5) $\Big\{ \sum_{b \notin EQ(t)} p_b = 1 - p,\ p_b \notin \Big( \frac{1-\kappa}{n}, \frac{1+\kappa}{n} \Big) \Big\}$.

Indeed, this is a consequence of the fact that the function

(5.6) $(p_b : b \notin EQ(t)) \mapsto - \sum_{b \notin EQ(t)} p_b \log p_b$

is concave and attains its maximum over the convex set
$\{\sum_{b \notin EQ(t)} p_b = 1 - p\}$ at the point
$\Big( \frac{1-p}{n - |EQ(t)|}, \frac{1-p}{n - |EQ(t)|}, \ldots, \frac{1-p}{n - |EQ(t)|} \Big)$,
which is not an element of the set in (5.5). The latter statement follows from the
inequalities

(5.7) $\frac{1-p}{n - |EQ(t)|} \ge \frac{1 - (1+\kappa)|EQ(t)|/n}{n - |EQ(t)|} > \frac{1-\kappa}{n}$

and

$\frac{1-p}{n - |EQ(t)|} \le \frac{1 - (1-\kappa)|EQ(t)|/n}{n - |EQ(t)|} < \frac{1+\kappa}{n}$,

with the respective second inequalities in the latter two displays being
consequences of $|EQ(t)| < \frac{n}{2}$.
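The role of the condition $|EQ(t)| < n/2$ can be made concrete with a short check. The sketch below assumes that the middle terms of the two chains of inequalities are $\frac{1 - (1 \pm \kappa)m/n}{n - m}$ with $m = |EQ(t)|$, which is what the bounds $(1-\kappa)m/n \le p \le (1+\kappa)m/n$ on the mass of $EQ(t)$ give; the values of `n` and `kappa` are arbitrary.

```python
def mean_bounds(n, m, kappa):
    """Lower and upper bounds on (1 - p)/(n - m), where m = |EQ(t)|."""
    lower = (1.0 - (1.0 + kappa) * m / n) / (n - m)
    upper = (1.0 - (1.0 - kappa) * m / n) / (n - m)
    return lower, upper

n, kappa = 20, 0.3
checks = []
for m in range(0, n // 2):  # all sizes with m < n / 2
    lower, upper = mean_bounds(n, m, kappa)
    # The common value of the coordinates stays inside the open window of (5.1).
    checks.append((1.0 - kappa) / n < lower <= upper < (1.0 + kappa) / n)
```

Every entry of `checks` is true for these values, reflecting that both second inequalities reduce algebraically to $2\kappa m < \kappa n$.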

From the preceding argument we conclude that at least one of the coordinates
of a point in the set (5.5), which maximizes the function in (5.6), has to be
equal to $\frac{1-\kappa}{n}$ or $\frac{1+\kappa}{n}$. Eliminating this coordinate and
proceeding with the same argument, we deduce that at least $\frac{n}{2} - |EQ(t)|$
coordinates of an optimizing point have to be equal to $\frac{1-\kappa}{n}$ or
$\frac{1+\kappa}{n}$. Now, eliminating all coordinates which belong to the set
$\big\{\frac{1-\kappa}{n}, \frac{1+\kappa}{n}\big\}$, we deduce the following: If the
inequality (5.3) fails, then the entropy $H(P_t)$ cannot exceed the entropy of a
probability measure on a set of $n$ elements, for which at least
$\frac{n}{2} - |EQ(t)|$ of its weights belong to the set
$\big\{\frac{1-\kappa}{n}, \frac{1+\kappa}{n}\big\}$ and the rest of its weights are equal.
In other words, denoting the proportion of weights which are equal to
$\frac{1-\kappa}{n}$ by $\alpha_1$ and the proportion of weights which are equal to
$\frac{1+\kappa}{n}$ by $\alpha_2$, we have:
$H(P_t) \le \max_{\alpha_1, \alpha_2} \widetilde{F}(\alpha_1, \alpha_2)$ with

$\widetilde{F}(\alpha_1, \alpha_2) = -\alpha_1 (1-\kappa) \log \frac{1-\kappa}{n} - \alpha_2 (1+\kappa) \log \frac{1+\kappa}{n} - \big(1 - (1-\kappa)\alpha_1 - (1+\kappa)\alpha_2\big) \log \frac{1 - (1-\kappa)\alpha_1 - (1+\kappa)\alpha_2}{(1 - \alpha_1 - \alpha_2)\, n}$.

Hereby, the maximum is taken under the constraints
$\frac{1}{2} - a \le \alpha_1 + \alpha_2 \le 1$, $\alpha_1 \ge 0$, $\alpha_2 \ge 0$,
$1 - (1-\kappa)\alpha_1 - (1+\kappa)\alpha_2 \ge 0$.

Next, we note that $\widetilde{F}$ can be written as $(\log n) + \overline{F}$, where
$\overline{F}$ is given by

$\overline{F}(\alpha_1, \alpha_2) = -\alpha_1 (1-\kappa) \log(1-\kappa) - \alpha_2 (1+\kappa) \log(1+\kappa) - \big(1 - (1-\kappa)\alpha_1 - (1+\kappa)\alpha_2\big) \log \frac{1 - (1-\kappa)\alpha_1 - (1+\kappa)\alpha_2}{1 - \alpha_1 - \alpha_2}$.

Hence, $H(P_t) \le (\log n) + \max_{\alpha_1, \alpha_2} \overline{F}(\alpha_1, \alpha_2)$, where the
maximum is taken over the region described at the end of the previous paragraph.
Now, a straightforward computation of the Hessian of $\overline{F}$ together with the
constraint $1 - (1-\kappa)\alpha_1 - (1+\kappa)\alpha_2 \ge 0$ show that the function
$\overline{F}$ is concave throughout the region over which its maximum is taken. In
addition, the maximum of $\overline{F}$ over the region determined by the constraints
$\alpha_1 \ge 0$, $\alpha_2 \ge 0$, $\alpha_1 + \alpha_2 \le 1$,
$1 - (1-\kappa)\alpha_1 - (1+\kappa)\alpha_2 \ge 0$ is attained at the point $(0, 0)$ and
is equal to $0$, since it corresponds to the highest value of the entropy that a
probability measure on a set of $n$ elements can take (namely, $\log n$). Thus, the
maximum of $\overline{F}$ over the region of interest is attained on the segment given
by the constraints $\alpha_1 \ge 0$, $\alpha_2 \ge 0$,
$\alpha_1 + \alpha_2 = \frac{1}{2} - a$. Plugging in $\frac{1}{2} - a - \alpha_1$ instead of
$\alpha_2$, and recalling the notation $\alpha = \frac{1}{2} - a$, we end up with
$H(P_t) \le (\log n) + \max_{0 \le \alpha_1 \le \alpha} F(\alpha_1)$.
This is the desired contradiction to (5.2).

We also observe that the function $F$ must be non-positive throughout $[0, \alpha]$,
since the entropy of a probability measure on a set of $n$ elements cannot exceed
the value $\log n$. Moreover, since the function $\overline{F}$ is concave, the function
$F$ is also concave. Furthermore, a straightforward computation shows that,
depending on the values of $\kappa$ and $a$, either the derivative of the function $F$
has no zeros on the interval $[0, \alpha]$, in which case $F$ attains its maximum at
one of the boundary points, or the only zero of the derivative of $F$ on the
interval $[0, \alpha]$ is given by $\alpha_1^*$ (defined in the statement of the theorem),
in which case $F$ attains its maximum at $\alpha_1^*$. This finishes the proof. □
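The passage from the entropy of the extremal measures to the two-variable function without the $\log n$ term can be verified on an explicit example. The sketch below builds a hypothetical measure on $n = 20$ points with four weights equal to $(1-\kappa)/n$, two weights equal to $(1+\kappa)/n$ and the remaining mass spread evenly, and compares its entropy with $\log n$ plus the normalized function; the name `F_bar` and all numerical values are illustrative.

```python
import math

def F_bar(a1, a2, kappa):
    """Two-variable function of the proof with the log n term removed."""
    rest = 1.0 - (1.0 - kappa) * a1 - (1.0 + kappa) * a2  # mass of the equal weights
    return (-a1 * (1.0 - kappa) * math.log(1.0 - kappa)
            - a2 * (1.0 + kappa) * math.log(1.0 + kappa)
            - rest * math.log(rest / (1.0 - a1 - a2)))

def entropy(p):
    """Shannon entropy with natural logarithm."""
    return -sum(x * math.log(x) for x in p if x > 0.0)

n, k1, k2, kappa = 20, 4, 2, 0.5
weights = [(1.0 - kappa) / n] * k1 + [(1.0 + kappa) / n] * k2
rest_mass = 1.0 - sum(weights)
weights += [rest_mass / (n - k1 - k2)] * (n - k1 - k2)

lhs = entropy(weights)                                # entropy of the explicit measure
rhs = math.log(n) + F_bar(k1 / n, k2 / n, kappa)      # log n plus the normalized function
```

The two numbers agree up to rounding, the normalized function vanishes at the point (0, 0) corresponding to the uniform measure, and it is negative at the sample point (0.2, 0.1).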